A scoring function development for protein structure classification based on sequence and structure information

Author

  • Ye Tian
Abstract

This project combines sequence and structure information of proteins to build a scoring function that classifies protein structures whose class is unknown. The scoring function can be interpreted as a distance between two proteins. We began by identifying useful pieces of information, then constructed geometric and topological representations of them, together with distance metrics on those representations. In this paper, we used sequence information, residue number information, and amino acid information. Then, by learning from 40 classified proteins with linear programming, we attempt to determine a weight for each piece of information. Ideally, the "good" pieces of information should receive higher weights and the less useful ones lower weights. Once its weights are determined, the scoring function can be used to predict the classification of other proteins whose classification is unknown.
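The abstract does not spell out the linear program, so the following is only a minimal sketch of one plausible reading, under stated assumptions. Each feature k (for example sequence, residue number, or amino acid information) supplies a distance d_k between two proteins, and the score is the weighted sum of those distances. The weights are then chosen by a margin-based LP that asks each training protein to score closer to a same-class protein than to a different-class protein, minimizing the total violation (slack). The triplet formulation, the margin of 0.1, the sum-to-one normalization, the nearest-neighbor classification rule, the function names, and the toy data are all hypothetical choices, not taken from the paper. The sketch uses NumPy and SciPy's `linprog`.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def score(weights, feature_dists):
    """Weighted sum of per-feature distances; a lower score means more similar."""
    return weights @ feature_dists


def learn_weights(dist, labels, margin=0.1):
    """Fit non-negative feature weights with a margin-based linear program.

    Hypothetical setup: dist[k, i, j] is the distance between training
    proteins i and j under feature k; labels[i] is the known class of i.
    """
    n_feat, n_prot, _ = dist.shape
    rows = []
    # Triplet (i, j, l): j shares i's class, l does not. We want
    # score(i, j) + margin <= score(i, l), i.e. w . (d_ij - d_il) <= -margin.
    for i, j, l in itertools.permutations(range(n_prot), 3):
        if labels[i] == labels[j] and labels[i] != labels[l]:
            rows.append(dist[:, i, j] - dist[:, i, l])
    if not rows:
        raise ValueError("need a class with two members and a second class")
    n_trip = len(rows)

    # Variables: n_feat weights followed by one slack per triplet.
    # Each constraint becomes w . (d_ij - d_il) - slack_t <= -margin.
    a_ub = np.hstack([np.array(rows), -np.eye(n_trip)])
    b_ub = np.full(n_trip, -margin)
    # Weights sum to 1 so the margin cannot be met just by scaling them up.
    a_eq = np.concatenate([np.ones(n_feat), np.zeros(n_trip)])[None, :]
    b_eq = np.array([1.0])
    # Objective: minimize the total slack (total margin violation).
    c = np.concatenate([np.zeros(n_feat), np.ones(n_trip)])
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n_feat + n_trip), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n_feat]


def classify(query_dists, labels, weights):
    """Give the query the class of the lowest-scoring training protein.

    query_dists[k, i] is the feature-k distance from the query to protein i.
    """
    return labels[int(np.argmin(score(weights, query_dists)))]


# Toy data (hypothetical): 2 features, 4 proteins in classes A and B.
x = np.array([[0.0, 0.1, 1.0, 1.1],   # feature 0 separates the classes
              [0.0, 1.0, 0.1, 0.9]])  # feature 1 is misleading
dist = np.abs(x[:, :, None] - x[:, None, :])  # shape (2, 4, 4)
labels = ["A", "A", "B", "B"]

w = learn_weights(dist, labels)
query = np.array([1.05, 0.0])
pred = classify(np.abs(x - query[:, None]), labels, w)
print("weights:", w, "predicted class:", pred)
```

In this toy example, feature 0 separates the two classes while feature 1 points the wrong way, so every zero-slack solution puts at least 5/9 of the weight on feature 0. That matches the intuition in the abstract that "good" information should win higher weights. With the real 40-protein training set, dist would hold one distance matrix per feature.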


Similar resources

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

DNA sequence, containing all genetic traits, is not a functional entity. Instead, it is converted into protein sequences through transcription and translation. This protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can think of is undertaken by proteins with specific functio...


In silico Analysis and Molecular Modeling of RNA Polymerase, Sigma S (RpoS) Protein in Pseudomonas aeruginosa PAO1

Background: Sigma factors are proteins that regulate transcription in bacteria. Sigma factors can be activated in response to different environmental conditions. The rpoS (RNA polymerase, sigma S) gene encodes sigma-38 (σ38, or RpoS), a 37.8 kDa protein in Pseudomonas aeruginosa (P. aeruginosa) strains. RpoS is a central regulator of the general stress response and operates in both retroa...


Application of a simple likelihood ratio approximant to protein sequence classification

MOTIVATION Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken in order to explore their utility as a scoring (ranking) function in the classification of protein sequences. RESULTS We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top ranking sequence classes. The scoring meth...


In Silico Analysis of Primary Sequence and Tertiary Structure of Lepidium Draba Peroxidase

Peroxidase enzymes are widely applicable in industry and diagnosis. Recently, we introduced a new kind of peroxidase gene from Lepidium draba (LDP). According to protein multiple sequence alignment results, LDP had 93% similarity and 88.96% identity with horseradish peroxidase C1A (HRP C1A). In the current study we employed in silico tools to determine to which group of peroxidase enzymes LDP...


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...




Publication date: 2008